Abstract. Research on the growth of online tagging systems is not only interesting in its own right, but also yields insights for website management and semantic web analysis. Traditional models describing the growth of online systems can be divided into linear and nonlinear versions. Linear models, including the BA model (Barabási and Albert, 1999), assume that the average activity of users is a constant independent of population. Hence the total activity is a linear function of population. On the contrary, nonlinear models suggest that the average activity is affected by the size of the population, so that the total activity is a nonlinear function of population. In the current study, supporting evidence for the nonlinear growth assumption is obtained from data on Internet users' tagging behavior. A power law relationship between the number of new tags ($F$) and the population ($P$), which can be expressed as $F \sim P^{\gamma}$ ($\gamma > 1$), is found. I call this pattern accelerating growth and find that it relates to the time-invariant heterogeneity in individual activities. I also show how a greater heterogeneity leads to a faster growth.



PACS. XX.XX.XX No PACS code given 

1 Introduction

In the research literature, there are two kinds of models describing the growth of online social systems: linear and nonlinear. Linear models suggest that the expected value of individual activities $M$ is a constant independent of the active population $P$. As a consequence, the total amount of activity $F = MP$ is a linear function of $P$. For example, in the BA model for social network evolution [1], the average degree is a constant parameter. On the contrary, nonlinear models argue that the size of the population has an effect on individual activities [2], which leads to a nonlinear relationship between $F$ and $P$. The assumption of nonlinear growth is supported by findings in different online activities, including game playing [3], resource recommendation [4], collaborative programming [5] and tagging [6]. Moreover, evidence of the nonlinear nature of human collective behavior is also seen in many offline systems, such as cities and countries [7][8], academic groups [9], sexual networks [10], and so on.

In the current study, I investigate two online tagging systems and obtain supporting evidence for the nonlinear growth assumption. I find that within a tagging system, the number of new tags $F$ is a power law function of the active population $P$ with a constant exponent $\gamma$, despite the daily fluctuation of tags and population. Since $\gamma > 1$ means that the number of new tags grows faster than the population does, the pattern can be named "accelerating growth" [2][11][12]. The growth rate $\gamma$ is observed to relate positively to the heterogeneity in individual tagging activities $1/\beta$, in which $\beta$ is the exponent of the power law distribution of individual activities (Eq. 3). The heterogeneity $1/\beta$ is found to remain constant over time within a system, but differs across systems.

This paper is organized as follows. In Section 2, I introduce the empirical accelerating growth patterns in two online tagging systems. In Section 3, I show how the growth rate $\gamma$ relates to the heterogeneity of individual activities $1/\beta$, and validate their relationship with empirical data and simulation. Finally, I briefly summarize the findings and discuss possible applications of accelerating growth in the information industry.

2 Accelerating growth of online tagging systems

[Figure 1: log-log scatter plot; x axis: log population. Legend: Flickr, $\gamma = 1.39$, $R^2 = 0.94$; Delicious, $\gamma = 1.18$, $R^2 = 0.80$.]

Fig. 1: Accelerating growth in online tagging systems. Different datasets are marked in points of different colors (blue for Flickr and green for Delicious). The x axis is the log of active population in a day and the y axis is the log of total activity in the day. The orthogonal log-log regression lines are also shown.

Table 1: The empirical values of accelerating growth rate in online tagging systems.

Activity       System     $\gamma$  $\theta$  95% CI of $\gamma$  Adjusted $R^2$  N of Days  Web Address
Book-tagging   Delicious  1.18      0.18      [1.11, 1.31]        0.80            171        delicious.com
Photo-tagging  Flickr     1.39      0.39      [1.32, 1.46]        0.94            120        flickr.com

I investigate the tagging activities of 77,917 users on Flickr and 5,147 users on Delicious. The data from Flickr and Delicious were collected by systematically crawling the records of all active users [13]. In the two systems, I define $P$ as the number of active users in a day, and $F$ as the number of new tags generated by these users. Please refer to Table 1 for detailed information on the two systems.

Power law relationships between $F$ and $P$ are found in the two systems, which can be expressed as

$$F \sim P^{\gamma} \tag{1}$$

or

$$\log(F) = \gamma \log(P) \tag{2}$$

Orthogonal regression [14] is used to estimate the value of $\gamma$ in eq. (2). I use orthogonal regression rather than ordinary least-squares regression because the latter tends to overstate the effect of outliers (data points with large variance). It is observed that the empirical values of the growth rate $\gamma$ in both systems are greater than 1. This finding of accelerating growth is consistent with the results of previous studies [3][4][5][6]. According to eq. (1), there is also a power law relationship between the average number of tags $M$ posted by a user and the population, since $M = F/P \sim P^{\gamma - 1} \sim P^{\theta}$. As $\gamma > 1$ and $\theta > 0$ in the data (Table 1), the average number of tags is not a constant; instead, it increases with population.
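The estimation step described above can be illustrated with a short sketch. It is not the code used in this study: the function name `orthogonal_loglog_fit`, the intercept term, and the daily records are assumptions made only for illustration, and the slope uses the standard closed form for a line that minimises perpendicular distances.

```python
import math

def orthogonal_loglog_fit(population, activity):
    """Orthogonal (total least squares) fit of log(F) = a + gamma * log(P).

    Hypothetical helper; the intercept a is an assumption, since eq. (2)
    only states the slope.
    """
    xs = [math.log(p) for p in population]
    ys = [math.log(f) for f in activity]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("log P and log F are uncorrelated; slope is undefined")
    # Closed-form slope of the line minimising perpendicular distances.
    gamma = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - gamma * mx
    return gamma, intercept

# Made-up daily records (active users, new tags) generated from F = 2 * P**1.39.
days_p = [50, 120, 300, 800, 2000]
days_f = [2 * p ** 1.39 for p in days_p]

gamma, intercept = orthogonal_loglog_fit(days_p, days_f)
theta = gamma - 1  # exponent of the average activity, M ~ P**theta
```

On noise-free synthetic data the fit recovers the generating exponent; on real daily records the estimate would differ from an ordinary least-squares slope whenever both axes carry noise.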

Comparing the $\theta$ in the data with those reported in previous studies sheds light on the accelerating nature of human collective behavior [8][15]. The phenomenon that average activities (e.g., average salary or average walking speed of pedestrians) increase with city population has been well studied [7][15], and is used to explain the different paces of urban and rural life. Similar patterns include GDP per capita, which increases with country size [8], and the average length of references in publications, which increases with the scale of the scientific collaboration network [5]. However, most of the $\theta$ observed offline are not greater than their counterparts online. For instance, the average activity in offline systems usually scales with system size with a $\theta$ no greater than 0.35 [7][5], whereas the $\theta$ found in my study can be greater than 0.38 (in Flickr). Given the population of a system, a larger $\theta$ means higher productivity in generating activities. Therefore it is reasonable to conjecture that online social systems, if utilized appropriately, may be more "productive" than offline ones.

Then, why is there accelerating growth and what determines the value of $\theta$ (or $\gamma$)? Although several models have been proposed to answer these questions [16][17][18], there is still a lack of a unified framework explaining the origin of accelerating growth. In the current study, I find that accelerating growth relates to the stable (or time-invariant) heterogeneity in individual activities. In the next section, I will analytically show how heterogeneity gives rise to accelerating growth. Please note that this study does not aim to give an exclusive and unified framework for the origins of all accelerating growth patterns, although I hope my model may contribute to such a framework in one way or another.

3 Heterogeneity and accelerating growth

Many models have been proposed to explain the nonlinear growth phenomenon [16][17][18][19], but few can be generalized across different fields of study. Some ecologists suggest that nonlinear growth originates from the self-similar network structure of biological individuals [17] or metabolic networks that minimize transportation cost [18]. Scholars in linguistics suggest that a skewed distribution of word occurrence leads to nonlinear growth in word vocabulary [16]. Similar ideas are also seen in studies on the growth of family names [19] and open source software [5]. In the current research, I analytically attribute the accelerating growth of online tags to the time-invariant heterogeneity of individual tagging activities in the system. The heterogeneity is quantified as $1/\beta$, in which $\beta$ is the exponent of the power law distribution of individual activities. The positive correlation between heterogeneity $1/\beta$ and accelerating growth rate $\gamma$ is verified by empirical data and numerical simulations.

3.1 Time-invariant power law distribution in online tagging systems

To find out why the average number of tags increases with population, I explore the daily distributions of user tagging activities. It turns out that the distributions approach a straight line on log-log axes, suggesting a power law distribution model. I use $f$ to denote the number of tags and $n(f)$ the number of users who generate that many tags. Then the power law probability distribution function (PDF) of $f$ can be expressed as

$$n(f) = C_t f^{-\beta} \tag{3}$$

in which $C_t$ varies over time. In Eq. 3 the value of $\beta$ is set to be greater than one, since the cumulative distribution function (CDF) of the power law distribution (Eq. 3), called the Pareto distribution [20], is also a power law distribution with an exponent $-\alpha = 1 - \beta < 0$.

In Figure 2, I draw 3 samples of daily distributions of individual activities on log-log axes for both systems. It is observed that, when the population increases, the distributions move toward the right in parallel. Hence, I conjecture that $C_t$ in eq. (3), which determines the intercept of the regression line, increases with system size, but $\beta$, which decides the slope of the line, is a size-invariant constant. In other words, all the daily distributions should collapse to one theoretical curve if the variance of system size is controlled. I call the theoretical curve the time-invariant power law distribution, since its slope does not change over time.

What makes the system maintain a time-invariant power law distribution in its temporal evolution? This is a question calling for further exploration. However, this phenomenon is non-trivial, since it means that the heterogeneity of the system does not change over time, which will give rise to accelerating growth. But before I discuss the connection between heterogeneity and accelerating growth, I would like to derive the function of the time-invariant distribution first.

[Figure 2: two log-log panels, Flickr and Delicious, labeled (a) and (b); x axis: log of number of tags. Legend dates: Jan. 15, 2005; Mar. 06, 2005; Apr. 25, 2005 and Feb. 01, 2004; Mar. 10, 2004; Jun. 30, 2004.]

Fig. 2: Three examples of daily distributions of user tagging activities. The x axis is the log of number of tags and the y axis is the log of number of users. The rescaled distributions and the theoretical power law models with $\beta = 1.41$ (Flickr) and $\beta = 1.58$ (Delicious) are shown in the insets.
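The straight-line behaviour on log-log axes implied by Eq. 3 can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses an ordinary least-squares fit on log-transformed counts (the text does not say which regression was used to estimate $\beta$), the helper name `fit_power_law_exponent` is hypothetical, and the one-day histogram is synthetic.

```python
import math

def fit_power_law_exponent(counts):
    """Estimate beta and log(C_t) in n(f) = C_t * f**(-beta).

    counts maps f (tags posted by a user in a day) to n(f) (number of such users).
    Hypothetical helper using a plain least-squares line on log-log axes.
    """
    pts = [(math.log(f), math.log(n)) for f, n in counts.items() if f > 0 and n > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive (f, n(f)) pairs")
    m = len(pts)
    mx = sum(x for x, _ in pts) / m
    my = sum(y for _, y in pts) / m
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx  # slope of log n(f) against log f equals -beta
    return -slope, my - slope * mx

# Synthetic one-day histogram generated with C_t = 500 and beta = 1.41.
one_day = {f: 500.0 * f ** -1.41 for f in (1, 2, 4, 8, 16, 32)}
beta_hat, log_c_hat = fit_power_law_exponent(one_day)
```

Repeating the fit for several days would show whether the slope stays fixed while the intercept shifts, which is the pattern described for Figure 2.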



As in tagging systems there is usually only one user who generates the maximum activity $f_{max}$, that is

$$n(f_{max}) = 1 \approx C_t f_{max}^{-\beta} \;\Rightarrow\; C_t = f_{max}^{\beta} \tag{4}$$

With eq. (4), we can rewrite eq. (3) as

$$n(f) = f_{max}^{\beta} f^{-\beta} \tag{5}$$

Eq. (5) is the function of the time-invariant power law distribution. As mentioned, all empirical daily distributions are supposed to collapse to the curve predicted by eq. (5) after rescaling. Therefore, in both systems under study I rescale the daily distributions together before estimating the values of $\beta$. According to the large values of adjusted $R^2$ shown in Table 2, the assumption about a time-invariant distribution is plausible.

Table 2: Parameters in time-invariant power law model of online tagging systems.

System     $\beta$  95% CI of $\beta$  Adjusted $R^2$  N of Days
Delicious  1.58     [1.57, 1.59]       0.90            171
Flickr     1.41     [1.40, 1.41]       0.90            120

Fig. 3: The relationship between heterogeneity ($1/\beta$) and growth rate ($\gamma$). The analytical function is drawn as a purple line, the results of numerical simulations are plotted as small dots whose color varies with $C$, and the empirical results of the two tagging systems are marked with large green (Delicious) and blue (Flickr) dots. In the numerical simulation based on the power law PDF $f(x) = C^{\beta-1} x^{-\beta} (\beta - 1)$, $C$ is an integer in $1 \le C \le 10$, and $\beta$ is a fraction in $1 < \beta < 10$. For each value of $C$, 40 different values of $\beta$ are selected. Therefore, there are 400 small dots in the figure.

3.2 From power law distribution to accelerating growth

Given a distribution function, the relationship between the population $P = \int n(f)\,df$ and the total number of tags $F = \int n(f) f\,df$ is fully predictable. With the time-invariant power law distribution shown above (eq. (5)), the time-invariant relationship between $P$ and $F$ can be derived as follows.

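As $f_{max} \gg 1$ in empirical data,

$$P = \int_1^{f_{max}} n(f)\,df = \frac{f_{max}}{1-\beta} - \frac{f_{max}^{\beta}}{1-\beta} \approx \frac{f_{max}^{\beta}}{\beta-1} \tag{6}$$

When $\beta > 2$,

$$F = \int_1^{f_{max}} n(f) f\,df = \frac{f_{max}^{2}}{2-\beta} - \frac{f_{max}^{\beta}}{2-\beta} \approx \frac{f_{max}^{\beta}}{\beta-2} \tag{7}$$

Therefore $F \sim P$, $\gamma \approx 1$, $\theta \approx 0$. The greater $\beta$ is, the more closely $\gamma$ approaches 1.

When $\beta = 2$,

$$P = \int_1^{f_{max}} n(f)\,df \approx f_{max}^{2}, \qquad F = \int_1^{f_{max}} n(f) f\,df = f_{max}^{2} \ln f_{max} \tag{8}$$

Therefore $F \sim P$ up to a logarithmic factor, $\gamma \approx 1$, $\theta \approx 0$.

Similarly, when $1 < \beta < 2$, we can derive that

$$P \approx \frac{f_{max}^{\beta}}{\beta-1} \tag{9}$$

$$F \approx \frac{f_{max}^{2}}{2-\beta} \tag{10}$$

Therefore $F \sim P^{2/\beta}$, $\gamma \approx 2/\beta$, $\theta \approx 2/\beta - 1$, and a smaller $\beta$ leads to a greater $\gamma$.

To sum up, we get

$$\gamma = \begin{cases} 2/\beta & \text{if } 1 < \beta < 2 \\ 1 & \text{if } \beta \geq 2 \end{cases} \tag{11}$$

According to eq. (11), if a system maintains a time-invariant power law distribution with an exponent $\beta < 2$, it will show the property of accelerating growth. As the power law distribution is known for its high heterogeneity (i.e., quantitative difference in individual activities in the system), and a smaller $\beta$ means higher heterogeneity, we can regard $1/\beta$ as an index of heterogeneity and claim that greater heterogeneity gives rise to faster growth (greater $\gamma$). As shown in Fig. 3 and Table 3, the theoretical function of $\gamma$ on $1/\beta$ (eq. (11)) is consistent with empirical data and simulation results.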
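The piecewise result can be checked numerically with a brief sketch. It assumes the distribution $n(f) = f_{max}^{\beta} f^{-\beta}$ integrated from 1 to $f_{max}$; the function names and the two $f_{max}$ values used to measure the log-log slope are illustrative choices, not part of the original analysis.

```python
import math

def predicted_gamma(beta):
    """Growth exponent from the piecewise rule: 2/beta for 1 < beta < 2, else 1."""
    if beta <= 1:
        raise ValueError("beta must exceed 1")
    return 2.0 / beta if beta < 2 else 1.0

def population_and_tags(f_max, beta):
    """Exact integrals of n(f) = f_max**beta * f**(-beta) over [1, f_max]."""
    if beta <= 1:
        raise ValueError("beta must exceed 1")
    c = f_max ** beta
    population = c * (1 - f_max ** (1 - beta)) / (beta - 1)
    if beta == 2:
        tags = c * math.log(f_max)  # the integrand reduces to c / f
    else:
        tags = c * (f_max ** (2 - beta) - 1) / (2 - beta)
    return population, tags

def numeric_gamma(beta, f_lo=1e4, f_hi=1e6):
    """Slope of log F against log P between two system sizes (illustrative f_max values)."""
    p1, t1 = population_and_tags(f_lo, beta)
    p2, t2 = population_and_tags(f_hi, beta)
    return (math.log(t2) - math.log(t1)) / (math.log(p2) - math.log(p1))

# Theoretical rates for the two fitted exponents reported in Table 2.
gamma_delicious = round(predicted_gamma(1.58), 2)  # 1.27
gamma_flickr = round(predicted_gamma(1.41), 2)     # 1.42
```

For finite $f_{max}$ the numeric slope only approaches the asymptotic value, so a small gap between `numeric_gamma` and `predicted_gamma` is expected.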

Table 3: Comparison between empirical ($\gamma$) and theoretical ($\gamma'$) accelerating growth rate.

System     $\beta$  $\gamma' = 2/\beta$  95% CI of $\gamma$
Delicious  1.58     1.27                 [1.11, 1.31]
Flickr     1.41     1.42                 [1.32, 1.46]

4 Discussions

In this paper, I discuss the accelerating growth of two online tagging systems, noting that it is the time-invariant heterogeneity of individual tagging activities that gives rise to such accelerating growth. Further, I analytically derive the function of the accelerating growth rate $\gamma$ on heterogeneity $1/\beta$, which is verified by empirical data and numerical simulations.

The findings of this paper support nonlinear models of collective human behavior, which have been widely acknowledged by previous studies [2][3][4][5][6]. Moreover, the relationship between heterogeneity and accelerating growth may exist not only in online tagging systems, but also in other online social systems, or even offline social systems.

Accelerating growth, once proved to exist widely in online social systems, can find applications in website management or semantic web analysis. For instance, by analyzing web traffic data, we can predict the traffic on a website with a given active population. Hence this pattern will help webmasters plan their web server capacity accordingly. Also, we can compare $\gamma$ between websites with equivalent functions and use the most efficient one as a benchmark for other websites.

Some remaining questions point out directions for future studies. For example, why can online tagging systems maintain a time-invariant power law distribution? Why is $C_t$ in eq. (3) determined only by system size? These are all interesting questions worth further exploration.
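The capacity-planning use mentioned in this section can be sketched as a simple extrapolation. This is only an illustration: the function name `predict_activity` and the reference day (1,000 active users, 5,000 new tags) are invented, and the sketch assumes that $F \sim P^{\gamma}$ continues to hold outside the observed range.

```python
def predict_activity(population, gamma, ref_population, ref_activity):
    """Extrapolate total daily activity with F ~ P**gamma, anchored at one observed day."""
    if population <= 0 or ref_population <= 0:
        raise ValueError("populations must be positive")
    return ref_activity * (population / ref_population) ** gamma

# Hypothetical reference day: 1,000 active users producing 5,000 new tags.
# Doubling the population with gamma = 1.39 more than doubles the activity.
forecast = predict_activity(2000, 1.39, 1000, 5000)
linear_forecast = predict_activity(2000, 1.0, 1000, 5000)
```

Comparing `forecast` with `linear_forecast` shows how much a linear assumption would understate the expected load when $\gamma > 1$.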